Integrated pathway mining and selection of an artificial CYP79-mediated bypass to improve benzylisoquinoline alkaloid biosynthesis

Background Computational mining of useful enzymes and biosynthesis pathways is a powerful strategy for metabolic engineering. By systematically exploring all conceivable combinations of enzyme reactions, including both known compounds and those inferred from the chemical structures of established reactions, we can uncover previously undiscovered enzymatic processes. Such novel alternative pathways can improve microbial bioproduction by bypassing or reinforcing metabolic bottlenecks. Benzylisoquinoline alkaloids (BIAs) are a diverse group of plant-derived compounds with important pharmaceutical properties, and BIA biosynthesis has become a prime example of metabolic engineering and microbial bioproduction. The early bottleneck of BIA production in Escherichia coli is the production of 3,4-dihydroxyphenylacetaldehyde (DHPAA) and its conversion to tetrahydropapaveroline (THP). Previous studies have used monoamine oxidase (MAO) and DHPAA synthase (DHPAAS) to produce DHPAA from dopamine and L-DOPA, respectively, with oxygen as a co-substrate; however, both enzymes produce toxic hydrogen peroxide as a byproduct. Results In the current study, in silico pathway design is applied to relieve the bottleneck of DHPAA production in the synthetic BIA pathway. Specifically, the cytochrome P450 enzyme tyrosine N-monooxygenase (CYP79) is identified as a means to bypass the established MAO- and DHPAAS-mediated pathways through an alternative arylacetaldoxime route to DHPAA with a peroxide-independent mechanism. This pathway is proposed to form fewer toxic byproducts, leading to improved reticuline production (up to 60 mg/L at flask scale) compared with the conventional MAO pathway. Conclusions This study showed improved reticuline production using the bypass pathway predicted by the M-path computational platform. Reticuline production in E. coli exceeded that of the conventional MAO-mediated pathway. 
The study provides a clear example of the integration of pathway mining and enzyme design in creating artificial metabolic pathways and suggests further potential applications of this strategy in metabolic engineering. Supplementary Information The online version contains supplementary material available at 10.1186/s12934-024-02453-7.


Background
Microbial bioproduction is now widely used to produce various natural and non-natural chemicals. Producing target compounds in microbial systems offers environmental and economic advantages over conventional technologies such as chemical synthesis and plant extraction. For example, microbes have been engineered to use renewable resources for the synthesis of next-generation biofuels and plastics as an alternative to petrochemistry [1,2]. Moreover, plant natural product pathways have been engineered in microbes for the bioproduction of flavors, fragrances, and medicines at higher yields than plant extraction processes [3,4]. Further advances in synthetic biology have enabled the production of artificial compounds with functional groups rarely found in nature [5]. However, to realize the biosynthesis of more target compounds, comprehensive methods for constructing de novo biosynthetic pathways and selecting the necessary enzymes must be systematized.
Computational methods are needed to search for the enzymatic reaction steps that enable efficient synthetic pathways, and databases of biological reactions have grown exponentially in size [6][7][8][9]. There are two general approaches to designing synthetic pathways: metabolic network searches and chemical structure-based reaction prediction. Network searches cover known biosynthetic pathways from starting substrates to target compounds [10][11][12][13]. Their advantage is high confidence in known heterologous enzymatic reactions; however, unreported biological reactions are not included. Chemical structure-based reaction prediction, on the other hand, can infer untested enzymatic reactions, much as retrosynthesis is used in chemical synthesis. To enable such prediction, reaction features are derived from the chemical structures of known enzymatic reactions. These features are then used to search all possible combinations of enzymatic reactions that can lead to the target compounds, including putative compounds and enzymatic reactions [6,14,15,16,17]. While chemical structure-based reaction prediction can uncover previously unknown enzymatic reactions, the number of possible solutions increases rapidly with the number of reactions and compounds in the database [6].
We previously constructed the computational platform M-path for chemical structure-based searching [6]. M-path controls the search space via an iterative random algorithm and suggests putative metabolic pathways, including artificial pathways, to target chemicals. For the M-path dataset, chemical structures were decomposed into lists of atom and bond types to create feature vectors over 318 atom and bond feature types; for example, the numbers of primary, secondary, and tertiary carbons were counted, and each covalent bond in a structure was recorded as a pair of atom types. Enzymatic reaction data were taken from KEGG (Kyoto Encyclopedia of Genes and Genomes), including the reactant-product pairs in the KEGG RPAIR database, and the chemical structures of each pair were converted into chemical feature vectors. The M-path algorithm uses linear programming to find combinations of reaction feature vectors that sum to the desired pathway feature vector. Once such combinations are obtained, pathways are built by ordering the reaction feature vectors and then matching intermediates to each pathway. Finally, the resulting pathways are ranked by a chemical similarity score. Each step in a pathway consists of a list of possible enzymatic reactions (multiple candidates with the same reaction feature vector) and a list of possible reaction intermediates (multiple candidates with the same chemical feature vector), and the chemical similarity score is calculated for every combination of enzymatic reaction step and intermediate. However, whether M-path can propose novel metabolic pathways that benefit metabolic engineering has not yet been validated experimentally [4].
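The core search step described above, finding reaction feature vectors that add up to the query's feature vector, can be illustrated with a toy example. This is a minimal sketch, not M-path itself: it assumes that a reaction feature vector is the product vector minus the substrate vector, replaces the 318 atom and bond features with four element counts (C, H, N, O), and swaps M-path's linear programming and iterative random search for a brute-force enumeration that is only feasible for a tiny reaction set. The names `COMPOUNDS`, `KNOWN_REACTIONS`, `reaction_vector`, and `find_combinations` are illustrative.

```python
from itertools import combinations_with_replacement

# Toy feature vectors: element counts (C, H, N, O). M-path itself uses 318
# atom/bond feature types; these four features are a deliberate simplification.
COMPOUNDS = {
    "L-tyrosine": (9, 11, 1, 3),
    "4HPAA-oxime": (8, 9, 1, 2),
    "acetophenone oxime": (8, 9, 1, 1),
    "acetophenone": (8, 8, 0, 1),
    "L-DOPA": (9, 11, 1, 4),
    "dopamine": (8, 11, 1, 2),
    "DHPAA": (8, 8, 0, 3),
}

# Known substrate -> product pairs, in the spirit of KEGG RPAIR entries.
KNOWN_REACTIONS = {
    "EC 1.14.14.36 (L-tyrosine -> 4HPAA-oxime)": ("L-tyrosine", "4HPAA-oxime"),
    "EC 1.2.3.1 (acetophenone oxime -> acetophenone)": ("acetophenone oxime", "acetophenone"),
    "EC 4.1.1.28 (L-DOPA -> dopamine)": ("L-DOPA", "dopamine"),
    "EC 1.4.3.4 (dopamine -> DHPAA)": ("dopamine", "DHPAA"),
}


def reaction_vector(substrate, product):
    """Assumed reaction feature vector: product features minus substrate features."""
    return tuple(p - s for s, p in zip(COMPOUNDS[substrate], COMPOUNDS[product]))


def find_combinations(start, goal, reactions, max_steps=2):
    """Return every multiset of known reactions whose vectors sum to the query vector.

    Brute-force stand-in for M-path's linear-programming step; the steps are
    not yet ordered and intermediates are not yet assigned.
    """
    target = reaction_vector(start, goal)
    vectors = {name: reaction_vector(*pair) for name, pair in reactions.items()}
    hits = []
    for k in range(1, max_steps + 1):
        for combo in combinations_with_replacement(sorted(vectors), k):
            total = tuple(sum(vectors[name][i] for name in combo) for i in range(len(target)))
            if total == target:
                hits.append(combo)
    return hits


for combo in find_combinations("L-DOPA", "DHPAA", KNOWN_REACTIONS):
    print(" + ".join(combo))
```

With these toy vectors, both the CYP79-like plus aldehyde-oxidase-like pair and the conventional decarboxylase plus MAO pair sum to the L-DOPA to DHPAA query vector; ordering the steps and matching intermediates, as M-path does next, is not shown.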
To showcase the practical utility of M-path for artificial pathway mining and enzyme design, we sought to enhance the production of benzylisoquinoline alkaloids (BIAs) using this computational method. BIAs include various pharmaceutical compounds, such as berberine (antidiarrheal and anticancer), sanguinarine (antibacterial and anticancer), morphine (analgesic), and codeine (antitussive) [18,19]. Microbial bioproduction of BIAs offers advantages in production cost, environmental sustainability, and process control [20]. Microbial BIA bioproduction has been achieved in yeast via a phenylpyruvate decarboxylase-dependent pathway through the aryl acetaldehyde intermediate 4-hydroxyphenylacetaldehyde (4HPAA), with norcoclaurine (higenamine) as the first committed BIA intermediate [21][22][23][24], and in Escherichia coli via a monoamine oxidase (MAO)-dependent pathway through the aryl acetaldehyde intermediate DHPAA.
In the present study, we improved the bioproduction of BIAs by introducing a new pathway predicted by M-path, providing a clear example of how computational prediction of enzymes can rapidly improve the bioproduction of valuable target chemicals. M-path predicted a pathway containing 3,4-dihydroxyphenylacetaldoxime (DHPAA-oxime), which bypasses the peroxide-generating MAO- and DHPAAS-mediated pathways. In this pathway, L-DOPA is converted to DHPAA-oxime through an apparently peroxide-independent route. Implementing this bypass route increased the production of the essential BIA intermediate reticuline to 60 mg/L, 3-fold higher than that of the conventional MAO-mediated pathway. This rational metabolic engineering, validated in a heterologous E. coli bioproduction system, illustrates the effectiveness of the M-path computational workflow.

Comprehensive mining of alternative pathways
Pathway mining was performed using the web-based version of M-path as described in our original report [6]. The search query was the reaction from L-DOPA (C00355) to DHPAA (C04043). The mining dataset comprised chemical data from the KEGG (17,091 compounds) and PubChem (47,686,910 compounds) databases and enzymatic reaction data from KEGG (9,097 reactions). To integrate all pathway data, we selected the reactions and compounds from pathway candidates with higher scores (0.7 or larger, and 0.8 or larger).
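The selection step above can be pictured as pooling the reactions and compounds from every pathway candidate whose score clears a cutoff. This is a hypothetical sketch: the `PathwayCandidate` record, the reaction and intermediate names, and the scores are invented for illustration, and it makes no assumption about the format in which the M-path web interface actually reports candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayCandidate:
    """A ranked candidate: ordered (reaction, product) steps and its pathway score."""
    steps: tuple
    score: float


def integrate_candidates(candidates, threshold):
    """Pool reactions and compounds from every candidate scoring at or above threshold."""
    reactions, compounds = set(), set()
    for cand in candidates:
        if cand.score >= threshold:
            for reaction, product in cand.steps:
                reactions.add(reaction)
                compounds.add(product)
    return reactions, compounds


# Hypothetical candidates with invented identifiers and scores (not M-path output).
CANDIDATES = [
    PathwayCandidate((("reaction-1", "intermediate-A"), ("reaction-2", "DHPAA")), 0.84),
    PathwayCandidate((("reaction-3", "intermediate-B"), ("reaction-2", "DHPAA")), 0.73),
    PathwayCandidate((("reaction-4", "intermediate-C"), ("reaction-5", "DHPAA")), 0.55),
]

for cutoff in (0.7, 0.8):
    reactions, compounds = integrate_candidates(CANDIDATES, cutoff)
    print(cutoff, sorted(reactions), sorted(compounds))
```

Using a set per category means a reaction shared by several high-scoring candidates is counted once in the integrated data.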

Enzyme selection and design
Orthologous genes were evaluated via phylogenetic analysis. The threshold for homology analysis was set at over 1030 and similarity over 39%. Phylogenetic analysis was performed with ClustalX using the Neighbor-Joining method [31,32]. The hydrophobic N-terminal region was predicted as a transmembrane feature of the enzyme structure using SOSUI [33].
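ClustalX builds its tree from a pairwise distance matrix with the Neighbor-Joining method. The sketch below implements the standard Neighbor-Joining steps of Saitou and Nei in plain Python to show what that step does; it is not ClustalX's code, it omits the alignment, distance correction, and bootstrapping that ClustalX performs, and it runs on the textbook five-taxon distance matrix (taxa a to e) rather than on the CYP79 sequences. The function name and internal node labels are illustrative.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by Neighbor-Joining (needs at least two taxa).

    names: taxon labels; dist: symmetric matrix (list of lists) of pairwise distances.
    Returns edges as (node, neighbor, branch_length); internal nodes are "node1", "node2", ...
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: sum of distances from i to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, d[i][j] - li)]
        # Distance from u to each remaining node k.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Textbook five-taxon example (not CYP79 data).
TAXA = ["a", "b", "c", "d", "e"]
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for node, neighbor, length in neighbor_joining(TAXA, DIST):
    print(f"{node} -- {neighbor}: {length:g}")
```

For this matrix the first pair joined is (a, b), with branch lengths 2 and 3, and the resulting unrooted tree has the expected 2n - 3 = 7 edges.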

Gene cloning
All sequence information used in this study is shown in Table S1. All synthetic genes were optimized for E. coli codon usage, purchased from GenScript (NJ, USA), and cloned into pET23a (NdeI-BamHI sites). The optimized sequences are shown in Fig. S1. Constructed plasmids are shown in Table S2. All molecular techniques were performed according to standard protocols and a previous study [30], with details given in the supplementary materials and methods section. All primers used in the current study are listed in Table S3.

Strain construction
The parental E. coli strain used in this work was BL21(DE3) (Novagen).

CYP79-mediated DHPAA-oxime bioproduction
Experimental conditions were based on previously reported methods [4,30]. Cells were cultured overnight in 4 mL of LB medium supplemented with appropriate antibiotics at 25 °C. Overnight cultures were inoculated to an initial OD600 of 0.05 in 4 mL of TB medium (12 g/L tryptone (Difco), 24 g/L yeast extract (Difco), 9.4 g/L K2HPO4, 2.2 g/L KH2PO4, and 4 mL/L glycerol) supplemented with appropriate antibiotics in a test tube at 25 °C. Isopropyl β-D-thiogalactopyranoside (final concentration 300 µM) and glucose (final concentration 3%) were added for induction and bioproduction 15 h after inoculation.
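The volumes needed to reach the stated inoculation density and final inducer and glucose concentrations follow from a simple mass balance. The helper below is a hedged sketch: the overnight OD600 of 4.0, the 1 M IPTG stock, and the 50% (w/v) glucose stock are assumed values not given in the protocol, the 3% glucose is assumed to be w/v, and each addition is treated independently of the others.

```python
def stock_volume(final_conc, stock_conc, start_volume):
    """Volume of stock to add to start_volume so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (start_volume + v). Concentrations share
    one unit (OD600, µM, or % w/v) and volumes share another (here mL).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * start_volume / (stock_conc - final_conc)


MEDIUM_ML = 4.0  # TB medium per test tube (from the protocol)

# Assumed stocks (hypothetical). Each addition is computed on its own, ignoring the
# small extra dilution caused by the other additions made at the same time.
inoculum = stock_volume(0.05, 4.0, MEDIUM_ML)     # overnight culture at OD600 = 4.0
iptg = stock_volume(300, 1_000_000, MEDIUM_ML)    # 1 M IPTG stock, expressed in µM
glucose = stock_volume(3, 50, MEDIUM_ML)          # 50% (w/v) glucose stock

print(f"inoculum {inoculum * 1000:.1f} µL, IPTG {iptg * 1000:.2f} µL, glucose {glucose * 1000:.0f} µL")
```

Solving for the added volume, rather than using C1V1 = C2V2 on the final volume, matters mainly for the glucose addition, where the stock is only about 17 times more concentrated than the target.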

Reticuline bioproduction
Alkaloid bioproduction conditions also followed previously reported methods [4,30]. Overnight cultures were inoculated to an initial OD600 of 0.05 in 30 mL of TB medium supplemented with appropriate antibiotics in baffled shake flasks at 25 °C. This temperature follows the previous method [4,30] and is assumed to favor the expression of each gene and the function of the reticuline pathway. Isopropyl β-D-thiogalactopyranoside (final concentration 300 µM) and glucose (final concentration 3%) were added for induction and bioproduction 15 h after inoculation. For reticuline bioproduction with dopamine addition, dopamine was added to a final concentration of 5 mM at the time of induction.
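The amount of dopamine needed for the 5 mM addition to a 30 mL culture follows from moles = concentration × volume. This sketch assumes that dopamine is supplied as dopamine hydrochloride (C8H11NO2·HCl, about 189.64 g/mol), which the protocol does not state, and ignores the volume of the added solution; the function name is illustrative.

```python
def mass_mg(conc_mm, volume_ml, molar_mass_g_per_mol):
    """Milligrams of solute giving conc_mm (mmol/L) in volume_ml of culture."""
    mmol = conc_mm * volume_ml / 1000.0       # mmol/L * L = mmol
    return mmol * molar_mass_g_per_mol        # mmol * g/mol = mg


DOPAMINE_HCL_G_PER_MOL = 189.64  # assumed salt form (C8H11NO2·HCl)
print(f"{mass_mg(5, 30, DOPAMINE_HCL_G_PER_MOL):.1f} mg dopamine·HCl per 30 mL flask")
```

If free-base dopamine (about 153.18 g/mol) were used instead, the same call with that molar mass gives the corresponding amount.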

CE-MS analysis of DHPAA-oxime
DHPAA-oxime was synthesized according to previously reported methods [34,35]. The filtered DHPAA-oxime sample was diluted in a methionine sulfone solution before CE-MS analysis using an Agilent G7100 CE system and an Agilent G6224AA LC/MSD TOF system, as described previously [36]. Mass spectral peaks were identified using MassHunter Workstation versions 10.1 and B.06.00, respectively.

Reaction prediction and pathway design
Alternative routes to DHPAA are predicted by the computational platform M-path to increase precursor supply for reticuline bioproduction. From the query reaction of L-DOPA (C00355) to DHPAA (C04043), one of the predicted routes is a two-enzyme pathway through DHPAA-oxime, an intermediate that has not been previously reported in natural metabolism (Fig. 2A). M-path calculates Tanimoto scores as a similarity index between candidate compounds and the corresponding compounds from known enzymatic reactions. EC 1.14.14.36 (tyrosine N-monooxygenase) and EC 1.2.3.1 (aldehyde oxidase) are predicted as candidates for the two steps, with Tanimoto scores of 0.83 and 0.37, respectively (Fig. 2B). Tyrosine N-monooxygenase mediates the conversion of L-tyrosine (C00082) to 4HPAA-oxime (C04353). Aldehyde oxidase is reported to mediate the conversion of acetophenone oxime (CID5464950) to acetophenone (C07113) [37]. These two enzymes are accordingly selected for de novo bioproduction of DHPAA and downstream reticuline.
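M-path ranks candidate steps by Tanimoto similarity between feature vectors. A common form of the Tanimoto coefficient for count vectors is T(a, b) = a·b / (|a|² + |b|² - a·b), which reduces to the set-based Jaccard index for 0/1 vectors. The sketch below assumes that form and also assumes that a step score is the mean of the substrate-to-substrate and product-to-product similarities; M-path's exact formula may differ, and the feature vectors are hypothetical, so the output will not reproduce the 0.83 and 0.37 reported above.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient for non-negative count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical by convention.
    return dot / denom if denom else 1.0


def step_score(query_pair, known_pair):
    """Hypothetical step score: mean similarity of the substrates and of the products."""
    (q_sub, q_prod), (k_sub, k_prod) = query_pair, known_pair
    return (tanimoto(q_sub, k_sub) + tanimoto(q_prod, k_prod)) / 2


# Hypothetical 5-feature count vectors (e.g., counts of selected atom/bond types).
query = ([2, 1, 1, 0, 3], [2, 0, 1, 1, 2])   # stands in for L-DOPA -> DHPAA-oxime
known = ([1, 1, 1, 0, 3], [1, 0, 1, 1, 2])   # stands in for L-tyrosine -> 4HPAA-oxime
print(round(step_score(query, known), 3))
```

The count-vector form is used here because M-path's features are counts of atom and bond types rather than presence/absence bits.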

Selection of optimal tyrosine N -monooxygenase candidates
Tyrosine N-monooxygenase belongs to the CYP79 family and is designated CYP79A1 [38]. The reported CYP79A1-mediated reaction is shown in Fig. 3A: tyrosine and oxygen react, with heme as the cofactor, to produce 4HPAA-oxime, carbon dioxide, and water. The phylogenetic tree of the CYP79 family is shown in Fig. 3B, and reference data for CYP79 are given in Table S5. In addition to CYP79A1, CYP79B1 [39], CYP79D6v4 [40], and CYP79D62 [41], all with known N-monooxygenase activity and specificity for tyrosine, are included as candidates. The selected sequences were designed for expression in E. coli. N-terminally truncated versions of each sequence are also included, based on hydrophobic site prediction by SOSUI [33]. Truncation of the N-terminal hydrophobic sequence of another cytochrome P450 enzyme, rat hepatic cholesterol 7α-hydroxylase, has been reported to increase its expression in E. coli 10-fold [42]. We therefore expected N-terminal truncation to similarly increase the amount of expressed CYP79A1. The truncated enzymes are referred to as CYP79A1N, CYP79B1N, CYP79D6v4N, and CYP79D62N. In total, eight strains were constructed for DHPAA-oxime production from L-DOPA (Table S4).
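SOSUI predicts transmembrane helices with its own algorithm. As a simpler, swapped-in illustration of how an N-terminal hydrophobic segment can be located before designing a truncation, the sketch below uses a Kyte-Doolittle sliding-window hydropathy profile (19-residue window, 1.6 cutoff, a common rule of thumb for membrane-spanning segments). The function names, the 60-residue search range, the rule of cutting just after the last above-threshold window, and the demo sequence are all assumptions, not the procedure used to design CYP79A1N or the other truncated constructs.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window, indexed by the window's first residue."""
    values = [KD[aa] for aa in seq.upper()]  # only the 20 standard residues are handled
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def n_terminal_cut_site(seq, window=19, threshold=1.6, search_len=60):
    """Index of the first residue after the first hydrophobic stretch near the N-terminus.

    Finds the first window (starting within search_len residues) whose mean
    hydropathy exceeds threshold, extends through consecutive above-threshold
    windows, and returns the position just past the last of them (0 if none).
    """
    profile = hydropathy_profile(seq, window)
    start = next((i for i, h in enumerate(profile[:search_len]) if h > threshold), None)
    if start is None:
        return 0
    end = start
    while end + 1 < len(profile) and profile[end + 1] > threshold:
        end += 1
    return end + window


# Hypothetical test sequence: Met, a 22-residue Leu stretch, then a charged tail.
demo = "M" + "L" * 22 + "KDE" * 8
cut = n_terminal_cut_site(demo)
print(cut, demo[cut:cut + 10])
```

Because each window averages 19 residues, the returned boundary can fall a few residues past the end of the hydrophobic stretch (index 28 here, versus the first charged residue at index 23), so a real construct would still need manual inspection.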
The product, DHPAA-oxime, is detected by LC-MS analysis (SIM m/z 168) in the culture supernatants of DK060 expressing CYP79A1 and DK064 expressing CYP79A1N (Fig. 3C & D); DK060 and DK064 additionally express genes for enhanced L-DOPA production and CYP79 function. The DHPAA-oxime standard was verified by high-resolution CE-MS analysis (detection of m/z 168.0655). In CE-MS analysis, DHPAA-oxime was detected as a distinct peak, clearly distinguishable from 4HPAA-oxime. Furthermore, LC-MS/MS MRM analysis of DHPAA-oxime production could be performed using the synthesized standard (Fig. S7). The LC-MS/MS MRM chromatogram of DHPAA-oxime is shown in Fig. S7E, and the MRM peak areas for DHPAA-oxime are used to compare production levels (Fig. S7G). Extracted chromatograms for DHPAA-oxime production are shown in Fig. 3D, and those of the other strains in Fig. S8. Only CYP79A1 exhibits L-DOPA N-monooxygenase activity, and truncation of its hydrophobic site leads to higher DHPAA-oxime production.
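The SIM target (m/z 168) and the high-resolution value (m/z 168.0655) both correspond to the protonated molecule [M+H]+ of DHPAA-oxime, C8H9NO3. The sketch below recomputes the monoisotopic m/z from standard isotope masses and, for comparison, the value for 4HPAA-oxime (C8H9NO2). The function name is illustrative, and only simple formulas of C, H, N, and O with a single added proton are supported.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u) and of the proton.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812


def mz_protonated(formula):
    """m/z of [M+H]+ for a simple formula such as 'C8H9NO3'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass + PROTON


for name, formula in [("DHPAA-oxime", "C8H9NO3"), ("4HPAA-oxime", "C8H9NO2")]:
    print(f"{name} {formula}: [M+H]+ m/z {mz_protonated(formula):.4f}")
```

The roughly 16 u (one oxygen) difference between the two oximes is what allows DHPAA-oxime to be monitored separately from 4HPAA-oxime by mass.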

CYP79A1 mediates reticuline bioproduction through DHPAA-oxime
In the in vivo system, DHPAA-oxime is readily converted to DHPAA, presumably by aldehyde oxidase or a related protein that is natively expressed and functional in E. coli. Therefore, reticuline could be produced via heterologous expression of CYP79A1 in the DHPAAS module (Fig. 4). Reticuline is identified by MRM analysis, in which the MS/MS fragments match those of the standard compound (Fig. 4C). Production of reticuline by strain DK083, expressing CYP79A1, and strain DK094, expressing CYP79A1N, is observed after 48 h of fermentation from glucose with added dopamine (Fig. 4D). DK083 produces approximately 18 mg/L reticuline, and DK094 approximately 50 mg/L. Therefore, truncation of the N-terminal hydrophobic site leads to higher reticuline production.

Time course production of reticuline in a complete pathway from glucose
Three complete reticuline-producing strains are compared in a time course analysis: DK126 expresses CYP79A1N, MT401 expresses CYP79A1, and DK138 contains the conventional pathway with MAO. DK126 produces approximately 65 mg/L reticuline, 3-fold higher than the conventional strain DK138 (Fig. 5). The concentrations of tyrosine, L-DOPA, dopamine, and THP were also analyzed by LC-MS (Fig. S9). No significant differences were observed in the time course data for tyrosine and L-DOPA. In contrast, in the MAO-expressing strain, dopamine accumulated significantly and THP levels were significantly lower. Again, truncation of the CYP79A1 hydrophobic site leads to improved reticuline production.

Discussion
In this study, we aimed to enhance BIA bioproduction through a novel pathway proposed by the pathway design tool M-path. The pathway bypasses the conventional routes mediated by MAO and DHPAAS without generating hydrogen peroxide as a byproduct. Implementing this bypass pathway resulted in a significant increase in the production of the key BIA intermediate reticuline. This improvement is probably attributable to the reduced cytotoxicity achieved by avoiding the formation of hydrogen peroxide. This study highlights the usefulness of M-path in the field of metabolic engineering.
To design metabolic pathways, the tradeoff between computational feasibility and the size of chemical reaction data must be considered. M-path avoids combinatorial explosion by combining an iterative random approach with linear programming. Because of its random nature, M-path can suggest metabolic pathways even when the total number of possible metabolic pathways cannot be exhaustively enumerated within the available computational time [6]. In related studies, the reactions or compounds considered were limited to a small number to keep the search computationally feasible [14,17,43,44]. Consequently, rare or poorly characterized enzymatic reactions and unknown metabolites are unlikely to be suggested by these methods. In contrast, M-path can readily suggest untested artificial pathways with the potential to relax bioproduction bottlenecks.
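The combinatorial explosion mentioned above is easy to quantify. With R = 9,097 KEGG reactions (the size of the reaction dataset used for mining), the number of ordered k-step reaction sequences grows as R^k, and even unordered multisets of reactions grow as C(R + k - 1, k). The sketch only counts candidate step combinations and ignores intermediate matching, so it illustrates scale rather than M-path's actual search space.

```python
from math import comb

R = 9097  # KEGG reactions in the mining dataset


def ordered_sequences(n_reactions, steps):
    """Ordered step sequences when any reaction may fill any step."""
    return n_reactions ** steps


def unordered_multisets(n_reactions, steps):
    """Multisets of reactions (repeats allowed, order ignored)."""
    return comb(n_reactions + steps - 1, steps)


for k in range(1, 6):
    print(k, f"{ordered_sequences(R, k):.3e}", f"{unordered_multisets(R, k):.3e}")
```

Even the unordered count for a three-step pathway exceeds 10^11 combinations, which is why sampling-based strategies such as M-path's iterative random approach are used instead of exhaustive enumeration.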
Our previous study reported a synthetic pathway for reticuline bioproduction that uses the insect enzyme 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS) to convert L-DOPA to DHPAA [4]. Monte Carlo simulation suggested that the DHPAAS-mediated pathway might achieve higher productivity than the MAO-mediated pathway; however, these pathways were not directly compared in vivo. To address this limitation, the CYP79-mediated pathway was directly compared here with the conventional MAO-mediated pathway, yielding reticuline titers of 65 mg/L and 20 mg/L, respectively. The MAO-mediated reticuline titer is at the same level as previously reported [4]. Moreover, the previous application of DHPAAS afforded only approximately 0.2 µM reticuline (65.8 µg/L) in vivo. Hydrogen peroxide is a byproduct of both DHPAAS and MAO. Therefore, cytotoxicity and oxidation of pathway intermediates by hydrogen peroxide might contribute to the lower reticuline titers of the MAO- and DHPAAS-containing pathways, whereas the absence of hydrogen peroxide production in the CYP79-mediated pathway might contribute to the improved reticuline titer. Our reticuline production results suggest that the CYP79-mediated pathway may outperform the MAO pathway; however, a comprehensive comparison of all pathway intermediates and hydrogen peroxide levels must be performed in future studies.
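The molar and mass titers quoted above are linked through the molar mass of reticuline (C19H23NO4, about 329.4 g/mol). The sketch below converts between the two units and computes the CYP79 versus MAO fold change; the helper names are illustrative, and the molar mass is computed from standard atomic weights rather than taken from the study, so 0.2 µM comes out as about 65.9 µg/L, consistent with the reported 65.8 µg/L to within rounding of the molar mass.

```python
# Standard atomic weights (g/mol) used to compute the molar mass of reticuline.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
RETICULINE = {"C": 19, "H": 23, "N": 1, "O": 4}
MW = sum(ATOMIC_WEIGHT[el] * n for el, n in RETICULINE.items())  # about 329.4 g/mol


def um_to_mg_per_l(conc_um, mw=MW):
    """µmol/L * g/mol = µg/L; divide by 1000 for mg/L."""
    return conc_um * mw / 1000.0


def mg_per_l_to_um(titer_mg_l, mw=MW):
    """Inverse conversion: mg/L to µmol/L."""
    return titer_mg_l * 1000.0 / mw


print(f"0.2 µM = {um_to_mg_per_l(0.2) * 1000:.1f} µg/L")
print(f"65 mg/L = {mg_per_l_to_um(65):.0f} µM; 20 mg/L = {mg_per_l_to_um(20):.0f} µM")
print(f"fold change, CYP79 vs MAO: {65 / 20:.2f}")
```

Expressing titers in molar units makes them directly comparable with substrate feeds such as the 5 mM dopamine addition.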
Bioproduction pathway design should carefully account for toxic byproducts, as well as factors such as reaction energy, enzyme activity, and enzyme engineering. Kinetic parameters such as kcat, Km, and specific activity of MAO towards aromatic amines and of CYP79 towards amino acids have been documented in the literature and in databases such as BRENDA, particularly for MAO. For example, the Km and kcat of Micrococcus luteus MAO towards tyramine are 170 µM and 20.8 s⁻¹, respectively [45]. In contrast, although low Km values of CYP79 have been reported for L-tyrosine (Table S6), quantifying the amount of pure CYP79 remains challenging, which limits detailed kinetic analysis [46]. A comparative analysis of enzyme activities would require a robust purification method for CYP79 and an experimental setup that ensures fair conditions for comparing its activity with that of MAO.
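The MAO parameters quoted above can be put on a common footing as a catalytic efficiency (kcat/Km) and a Michaelis-Menten rate, v = kcat[E][S]/(Km + [S]). The sketch assumes simple Michaelis-Menten kinetics and uses the M. luteus MAO values for tyramine from the text; the enzyme concentration of 1.0 in the example is arbitrary, and no CYP79 values are plugged in because, as noted above, reliable ones are not yet available. The function names are illustrative.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """kcat/Km in M^-1 s^-1."""
    return kcat_per_s / km_molar


def mm_rate(kcat_per_s, km, substrate, enzyme):
    """Michaelis-Menten rate; km and substrate share units, result is enzyme units per s."""
    return kcat_per_s * enzyme * substrate / (km + substrate)


KCAT = 20.8    # s^-1, M. luteus MAO with tyramine [45]
KM_UM = 170.0  # µM

print(f"kcat/Km = {catalytic_efficiency(KCAT, KM_UM * 1e-6):.3g} M^-1 s^-1")
for s_um in (17, 170, 1700):
    frac = mm_rate(KCAT, KM_UM, s_um, 1.0) / KCAT
    print(f"[S] = {s_um} µM -> {frac:.0%} of Vmax")
```

A fair comparison with CYP79 would need the same two quantities, which in turn require the active-enzyme concentration that is currently hard to determine for CYP79.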
The selection and design of specific enzyme sequences remains a major issue, even though pathway design tools are now established. In the first step of this study, M-path suggested EC 1.14.14.36 (tyrosine N-monooxygenase) as the mediator of DHPAA-oxime production from L-DOPA. N-Monooxygenase enzymes that convert amino acids to aldoximes are known to catalyze the initial reactions in the biosynthesis of the plant secondary metabolites cyanogenic glycosides and glucosinolates [47,48]. Various N-monooxygenases have been identified from plants: Sorghum bicolor CYP79A1 [49], Sinapis alba CYP79B1 [39], Triglochin maritima CYP79E1 and CYP79E2 [50], and Taxus baccata CYP79A118 [51]. Additional CYP79 enzymes have been reported to recognize tyrosine: Erythroxylum coca CYP79D62 [41], Populus trichocarpa CYP79D6v3 [40], and Populus nigra CYP79D6v4 [40]. CYP79E1 was reported to have no specificity for L-DOPA. Despite extensive studies on CYP79, there are no previous reports of N-monooxygenase-mediated conversion of L-DOPA to DHPAA-oxime [50].
To better inform enzyme selection, phylogenetic analysis was performed in addition to a review of previous reports (Fig. 3). Because the CYP79 tree does not contain distinct groups with clear differences in substrate specificity, diverse sequences had to be selected to cover the variation within the family. CYP79B1 was selected based on its annotation as a tyrosine N-monooxygenase and its phylogenetic grouping with CYP79A1. CYP79D62 and CYP79D6v4 were selected based on their reported activities towards tyrosine, while representing distinct CYP79 members not annotated as tyrosine N-monooxygenases. As a result, CYP79A1 was confirmed to convert L-DOPA to DHPAA-oxime. Further studies should comprehensively characterize the functional range of the CYP79 family, including a detailed comparison of CYP79 activities towards tyrosine and L-DOPA [52].
While DHPAA-oxime has not been reported in nature and may be considered a non-natural compound, BIAs might be produced via 4HPAA-oxime or DHPAA-oxime in some plants harboring CYP79. MS analysis detected DHPAA-oxime in the CYP79A1 data (Fig. S8). The two peaks observed in the extracted chromatogram are presumed to represent the (E) and (Z) stereoisomers. This observation aligns with previous studies demonstrating CYP79A1-mediated production of (E)-4HPAA-oxime and (Z)-4HPAA-oxime from L-tyrosine [53], as well as other instances of CYP79-mediated production of aldoxime isomers [52,54]. While further investigation is needed to establish whether DHPAA-oxime exists in plants, the predictions generated by M-path hold promise for pathway engineering through previously unexplored intermediates.
The current study focused on tyrosine N-monooxygenase rather than aldehyde oxidase, which was also suggested by M-path as a partner enzyme. This is because the constructed strains did not require heterologous expression of aldehyde oxidase: the native E. coli PaoABC complex, encoded by paoA, paoB, and paoC, has been identified as an aldehyde oxidase [55,56]. PaoABC is known to have broad substrate specificity, and PaoABC expressed in the absence of PaoD is inactive because it lacks the molybdenum cofactor [56]. However, disruption of paoD results in the same reticuline level as in the strain with active paoD (Fig. S10). Moreover, overexpression of paoABCD was evaluated using various plasmids, and the reticuline productivities of all strains remained at the same level (Fig. S11). Production might be further enhanced by implementing a specific aldehyde oxidase or a chemical mechanism responsible for converting DHPAA-oxime to the crucial arylacetaldehyde intermediate DHPAA. The M-path algorithm is also expected to be further improved based on the outcomes of such experiments. This study lays the groundwork for subsequent research on the application and refinement of M-path in metabolic engineering.

Conclusions
The prediction, construction, and application of an arylacetaldoxime-containing pathway to the key arylacetaldehyde intermediate DHPAA are demonstrated to improve reticuline production. This alternative pathway was discovered by the computational platform M-path and evaluated in an E. coli bioproduction system expressing CYP79. The resulting arylacetaldoxime bypass pathway relaxes the arylacetaldehyde production bottleneck to reticuline while presumably avoiding the production of toxic hydrogen peroxide, and leads to improvements in growth and reticuline production. Reticuline production using CYP79 reached 60 mg/L, 3-fold higher than that of the conventional MAO-mediated pathway at flask scale. While yeast is generally regarded as the preferred host for cytochrome P450 (CYP450) expression, the current results show that E. coli can also successfully host CYP450-mediated pathways. This study stands as a prominent example of integrating pathway mining and enzyme design to realize artificial metabolic pathways. With the accumulation of multi-omics data and experimental results, this strategy will be refined into a more rational approach to metabolic engineering.

Fig. 2 (A) Assignment of putative reaction paths. After input of the query compounds (L-DOPA and DHPAA), the query reaction feature is calculated to search for putative reactions with compounds that possess similar reaction features. (B) A chemical similarity score is used to rank pathways. Each score is calculated as an average Tanimoto coefficient of reaction similarity. The left panel indicates the putative reaction from L-DOPA to DHPAA-oxime, which is inferred based on the enzymatic reaction from L-tyrosine to 4HPAA-oxime (EC 1.14.14.36). The right panel indicates the putative reaction from DHPAA-oxime to DHPAA, which is inferred based on the similar enzymatic reaction from acetophenone oxime to acetophenone (EC 1.2.3.1).

Fig. 3 (A) Reaction formula of CYP79A1. (B) Phylogenetic tree of the CYP79 family. Sequences shown in red were tested in the current study. (C) Constructed strains and plasmid maps. (D) Typical extracted chromatograms of CYP79A1-expressing strain DK060, CYP79A1Ncut-expressing strain DK064, and the control strain DK068 (MAO-containing strain). The two peaks of DK060 and DK064 represent structural isomers of DHPAA-oxime.

Fig. 5 Time course of reticuline bioproduction in a complete pathway from glucose, and growth curve of each strain. Blue squares represent strain MT401 (CYP79A1); red circles represent strain DK126 (N-terminally truncated CYP79A1); yellow triangles represent control strain DK138.